Martin Sköld1, 2

1 Department of Environmental Research and Monitoring, Swedish Museum of Natural History
2 Department of Mathematics, Stockholm University

Introduction

The Swedish population of brown bear (Ursus arctos) is surveyed in each of four regions every fifth year. The survey is based on volunteers (mainly hunters) collecting scat samples that are sent for genetic analysis to the Swedish Museum of Natural History. For each successfully identified sample, we record

  • Animal ID
  • Sex
  • Date found
  • Geographical coordinate
  • Collector name

and the aim is to use capture-recapture methods to make inference about abundance. Capture-recapture methods are efficient when each animal is captured with the same probability. When there is unexplained individual heterogeneity in capture probability, results tend to be very sensitive to essentially arbitrary model assumptions (e.g. Link (2003)). Given the opportunistic nature of data-collection, geographical variation in effort is likely to be a source of heterogeneity. Here I will investigate to what extent this can be explained by sample coordinates.

Objectives

  1. Construct a tool (R-package) to investigate how survey effort varies geographically.
  2. Can it be used to explain individual heterogeneity in capture probability?

Methods

In the ecological literature, capture-recapture methods revolves around the concepts of visits (to describe the temporal aspect of data collection) and traps/detectors (to describe the geographical locations). When (as in the current context) these are absent, they are often constructed in an artificial manner by aggregation or imputation. I believe this to be a distraction. In other scientific fields, simple urn models that allow repeated sampling of individuals are more common (similar to the capwire R-package sometimes used by ecologists). Here I will use that, under perfect random sampling with replacement, the number of “captures” per individual can be modeled using a Poisson-distribution. The proportion of the full population that are not discovered can then be directly associated with \(p_0=\exp(-\mu)\), the probability that a Poisson(\(\mu\)) random variable attains the value zero. Here \(\mu\) is the average number of captures per individual.

Geographical variation

Survey effort is intimately connected to \(\mu\). We can accommodate geographical variation in effort by allowing this to depend on location, \(\mu=\mu(x, y)\). An individual with activity center at \((x_i, y_i)\) (here associated with the center point of sample locations) is then captured a Poisson(\(\mu(x_i, y_i)\)) number of times. Since we only observe animals captured at least once, the observed distribution will be zero-truncated. There is a wide range of statistical techniques for fitting \(\mu(x_i, y_i)\). I choose to fit penalised thin-plate splines as implemented by the mgcv package (Wood (2003)), an approach that has become popular in ecology over the past decade.

Population density

Given locations of captured animals, it is straightforward to estimate their density (activity centers per unit area). Much more straightforward than if we only had the locations of traps (as in e.g. Efford (2022)). I choose again to use mgcv based on a Poisson-model of gridded counts to fit the density \(\lambda(x, y)\) of observed animals. The density of the full population (observed and unobserved) is then obtained as \(\lambda(x, y)/(1 - \exp(-\mu(x, y)))\).

R package

An R-package (R Core Team (2022)) popdensity has been written to simplify fitting the models and visualising results.

Results

I illustrate the package by fitting the models to data on female bears from the 2021 survey in Norrbotten county. The number of captures per individual show a substantial overdispersion with respect to the Poisson distribution, which I attribute to individual heterogeneity (Figure 1).

Observed frequencies of number of captures per individual (bars) form the 2021 survey compared to a fitted Poisson distribution (dots). A: Using histogram and B: using hanging rootogram.

Figure 1: Observed frequencies of number of captures per individual (bars) form the 2021 survey compared to a fitted Poisson distribution (dots). A: Using histogram and B: using hanging rootogram.

Figures 2 and 3 illustrates the spatially varying effort by mean number of captures \(\mu(x, y)\) and capture probability \(1 - \exp(-\mu(x, y))\), here the latter depends rather crucially on the ability of \(\mu(x, y)\) to capture all individual heterogeneity in data.

# Sample session generating Figures 2-6
library(popdensity)
# region shapefile
norrbotten <- pd_SWE_counties("Norrbotten")
# bears2021 contains id and sample locations 
fit2021 <- pd_fit(data = bears2021, region = norrbotten)
figure2 <- plot(fit2021, type = "mean_captures")
figure3 <- plot(fit2021, type = "p_capture")
figure4 <- plot(fit2021, type = "pop_density")
figure5 <- plot(fit2021, type = "pop_density_se")
figure6A <- pd_rootogram(fit2021, type = "effort")
figure6B <- pd_rootogram(fit2021, 
                         type = "apparent_intensity")
Mean number of captures per female bear, Norrbotten county 2021

Figure 2: Mean number of captures per female bear, Norrbotten county 2021

Probability of capturing a female bear, Norrbotten county 2021

Figure 3: Probability of capturing a female bear, Norrbotten county 2021

Population density (females per square 10km), Norrbotten county 2021

Figure 4: Population density (females per square 10km), Norrbotten county 2021

In Figure 4, the capture probability have been combined with the density of observed animals to form population density. Since high-density areas have relatively high capture probability, this is visually very similar to the density of observed individuals. To give an impression of uncertainty involved, I have in Figure 5 illustrated the standard error of the population density estimate. Not surprisingly this is dominated by the mountain areas in the northwest where no samples are found. Data alone can not distinguish between low effort and low density in this area without further model assumptions.

Standard error of estimated population density (females per square 10km), Norrbotten county 2021

Figure 5: Standard error of estimated population density (females per square 10km), Norrbotten county 2021

Did I succed in explaining individual heterogeneity?

Figure 6A shows a rootogram similar to Figure 1B when geographical variability in effort is accounted for, The results are not very encouraging, overdispersion is only marginally reduced. Hence, Figure 3 is likely to overestimate capture probability and Figure 4 underestimate density.

Rootograms corresponding to the fit of $\mu$ (A) and $\lambda$ (B). While (B) does not indicate any problems, (A) barely improves on Figure 1B.

Figure 6: Rootograms corresponding to the fit of \(\mu\) (A) and \(\lambda\) (B). While (B) does not indicate any problems, (A) barely improves on Figure 1B.

A possible explanation is that effort is very “patchy” on the spatial scale, and this is over-smoothed by the fitted model. For example, it is not uncommon that several scat samples from the same individual are found along a path by an observer. This calls for more complex models of the observation process or data aggregation by e.g. only allowing one capture per individual and date/week.

What about RovQuant?

The RovQuant group builds their model around the framework of Efford (2022) with imputed traps/detectors. This makes computations very demanding. As far as the published version in Bischof et al. (2020), they focus more on describing the underlying population processes than issues with the observation process. Hence, while they may improve on other aspects, their method will suffer from similar problems with individual heterogeneity as discussed here.

Conclusions

Given sufficient effort and geographical coverage, the survey provides a firm lower bound for population size or density through the number of identified individuals. While this lower bound can to some extent be improved by statistical methods, estimates of population size or density should be interpreted with care due to the opportunistic nature of data.

Scan for package and nation-wide maps

Figure 7: Scan for package and nation-wide maps

References

Bischof, Richard, Cyril Milleret, Pierre Dupont, Joseph Chipperfield, Mahdieh Tourani, Andrés Ordiz, Perry de Valpine, et al. 2020. “Estimating and Forecasting Spatial Population Dynamics of Apex Predators Using Transnational Genetic Monitoring.” PNAS 117 (48): 30531–38.
Efford, Murray. 2022. secr: Spatially Explicit Capture-Recapture Models.
Link, William A. 2003. “Nonidentifiability of Population Size from Capture-Recapture Data with Heterogeneous Detection Probabilities.” Biometrics 59 (4): 1123–30.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Wood, S. N. 2003. “Thin-Plate Regression Splines.” Journal of the Royal Statistical Society (B) 65 (1): 95–114.

Spatial effort and density in Swedish population monitoring of the brown bear